Reinforcement learning for long-run average cost

Author

  • Abhijit Gosavi
Abstract

A large class of sequential decision-making problems under uncertainty can be modeled as Markov and semi-Markov decision problems when their underlying probability structure is governed by a Markov chain. Such problems can be solved with classical dynamic programming methods. However, dynamic programming suffers from the curse of dimensionality and breaks down rapidly in the face of large state spaces. In addition, dynamic programming requires exact computation of the transition probabilities, which are often hard to obtain; it is therefore said to suffer from the curse of modeling as well. In recent years, a simulation-based method called reinforcement learning has emerged in the literature. It can, to a great extent, relieve stochastic dynamic programming of these curses by generating near-optimal solutions to problems with large state spaces and complex transition mechanisms. In this paper, a simulation-based algorithm that solves Markov and semi-Markov decision problems is presented, along with its convergence analysis. The algorithm involves a step-size-based transformation on two time scales. Its convergence analysis is based on a recent result on asynchronous convergence of iterates on two time scales. We present numerical results from the new algorithm on a classical preventive maintenance case study of a reasonable size, for which results on the optimal policy are also available. In addition, we present a tutorial that explains the framework of reinforcement learning in the context of semi-Markov decision problems for long-run average cost.
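
To make the idea of simulation-based learning on two time scales concrete, here is a minimal sketch of a generic average-cost Q-learning loop for a semi-Markov decision problem. It is not the algorithm from the paper: the toy two-state maintenance model in `simulate`, all costs, durations and probabilities, the step-size schedules, and the function names are assumptions invented for illustration. The Q-values are updated with the larger step size `alpha`, while the average-cost estimate `rho` is updated with the smaller step size `beta`, so that `beta / alpha` tends to zero.

```python
import random

# Hypothetical toy SMDP, invented for illustration only.
# States: 0 = machine good, 1 = machine worn. Actions: 0 = produce, 1 = maintain.
def simulate(state, action, rng):
    """Return (next_state, cost, sojourn_time) for one simulated transition."""
    if action == 1:                      # maintenance: fixed cost and duration
        return 0, 2.0, 1.0
    if state == 0:                       # a good machine may wear out
        return (1 if rng.random() < 0.3 else 0), 0.0, 1.5
    if rng.random() < 0.5:               # a worn machine may fail: costly repair
        return 0, 10.0, 3.0
    return 1, 0.0, 1.5


def average_cost_q_learning(steps=20000, explore=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]         # q[state][action]
    rho = 0.0                            # estimate of the long-run average cost
    total_cost = total_time = 0.0        # accumulated over greedy steps only
    state = 0
    for k in range(steps):
        alpha = 1.0 / (1 + k) ** 0.6     # faster time scale: Q-values
        beta = 1.0 / (1 + k)             # slower time scale: rho (beta/alpha -> 0)
        greedy = min((0, 1), key=lambda a: q[state][a])
        action = rng.randrange(2) if rng.random() < explore else greedy
        nxt, cost, tau = simulate(state, action, rng)
        # Cost of the transition, net of the average cost charged for its duration.
        target = cost - rho * tau + min(q[nxt])
        q[state][action] += alpha * (target - q[state][action])
        if action == greedy:             # only greedy steps inform rho
            total_cost += cost
            total_time += tau
            rho += beta * (total_cost / total_time - rho)
        state = nxt
    policy = [min((0, 1), key=lambda a: q[s][a]) for s in (0, 1)]
    return rho, policy
```

Calling `average_cost_q_learning()` returns the average-cost estimate and a greedy action for each of the two states. Because the loop only needs sampled transitions from `simulate`, no transition probabilities are ever computed explicitly, which is the sense in which simulation sidesteps the curse of modeling.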

Similar resources

Improving Long Run Marginal Cost based Pricing along with Extended Benefit Factor method for Revenue Reconciliation of Transmission Network in Restructured Power System

Abstract: There are several methods to cover the costs of a transmission system and distribution networks. These methods are divided into either incremental or marginal approaches, which can be either long-term or short-term. The main difference between the incremental and marginal approach is how to calculate the cost of using the network. In the incremental approach, simulation and in the ma...

The Long Run Impact of Technology Diffusion on Average Cost in Upstream Oil Industry; Case Study of Iran

A review of the literature on nonrenewable resources shows that technological improvements have considerable effects on resource depletion and on reducing operational cost. It is therefore assumed that technology is the most important and influential variable in the production function and utilization cost of these resources. In this study, we assess the long-term effect of technology diffusion...

Semi-Markov decision problems and performance sensitivity analysis

Recent research indicates that Markov decision processes (MDPs) can be viewed from a sensitivity point of view; and perturbation analysis (PA), MDPs, and reinforcement learning (RL) are three closely related areas in optimization of discrete-event dynamic systems that can be modeled as Markov processes. The goal of this paper is two-fold. First, we develop PA theory for semi-Markov processes (S...

Self-Improving Factory Simulation using Continuous-time Average-Reward Reinforcement Learning

Many factory optimization problems, from inventory control to scheduling and reliability, can be formulated as continuous-time Markov decision processes. A primary goal in such problems is to find a gain-optimal policy that minimizes the long-run average cost. This paper describes a new average-reward algorithm called SMART for finding gain-optimal policies in continuous time semi-Markov decision...

A Novel Long-Run Marginal Cost Pricing Method for Compensating the Revenue Shortfall of the Transmission Network in a Restructured Power System

Long-run incremental pricing and long-run marginal pricing are two different approaches to pricing the usage of transmission and distribution networks. The main difference between these two methods is the way the cost of using the network is calculated. In the former approach, simulations are used, and in the latter, sensitivity analysis methods are used to determine the cost. In this paper, a novel analytical...

Journal:
  • European Journal of Operational Research

Volume 155  Issue 

Pages  -

Publication date: 2004